Cancer Epidemiology, Biomarkers & Prevention — Latest Matching Preprints

1

Impact of surveillance colonoscopy on colorectal cancer incidence and mortality in Lynch syndrome - a national observational cohort study of patients in the English NHS 2010-2022

Huntley, C.; Loong, L.; Mallinson, C.; Rahman, T.; Torr, B.; Allen, S.; Allen, I.; Hassan, H.; Fru, Y. W. J.; Tataru, D.; Paley, L.; Vernon, S.; Houlston, R.; Muller, D.; Lalloo, F.; Shaw, A.; Burn, J.; Morris, E.; Tischkowitz, M.; Antoniou, A. C.; Pharoah, P. D. P.; Monahan, K.; Hardy, S.; Turnbull, C.

2026-04-22 oncology 10.64898/2026.04.16.26351020 medRxiv

Top 0.1%

12.2%

Background: Lynch syndrome (LS) is a cancer susceptibility syndrome caused by germline pathogenic variants in DNA mismatch repair (MMR) genes. Because of the increased risk of colorectal cancer (CRC), enhanced colonoscopic surveillance is recommended for heterozygous MMR-carriers. Objective: Using a registry of English LS patients linked to digital National Health Service records, we aimed to assess adherence of MMR-carriers to national surveillance guidelines and to determine the impact of surveillance on CRC incidence and mortality. Design: We described the frequency of colonoscopies in 4,732 MMR-carriers and used logistic regression to determine predictors of surveillance adherence. For MMR-carriers with and without a record of surveillance, we estimated age-specific annual CRC incidence rates (AS-AIRs) and cumulative lifetime risks; assessed stage-shift by comparing CRC stage distributions and stage-specific AS-AIRs; and estimated risks of death from CRC and from any cause using Kaplan-Meier methods and Cox proportional hazards regression. Results: Surveillance at a mean interval of ≤3 years (n=3,028) was associated with a decrease in CRC-specific and all-cause mortality, without an associated change in total CRC incidence, even after multivariable adjustment. No strong evidence of stage-shift was observed. Colonoscopic surveillance at a mean interval of ≤2 years (n=1,569) was associated with an increase in total CRC incidence. Incidence of early-stage cancers was also higher, with no corresponding decrease in late-stage cancers, which may reflect the short follow-up period or overdiagnosis. Conclusion: The observed reduction in all-cause mortality amongst regularly surveilled MMR-carriers may indicate an impact of surveillance on CRC-specific mortality, although, in the context of a non-randomised study, it likely reflects the influence of selection bias.
KEY MESSAGES OF ARTICLE

What is already known on this topic: Regular surveillance colonoscopy is recommended in Lynch syndrome, though the supporting evidence remains mixed. We searched PubMed for articles published from inception to 01/05/2024 using the terms "Lynch syndrome", "HNPCC", "colonoscopy", "sigmoidoscopy", "surveillance", and "screening". We found one controlled trial and several small analytical studies from the early 2000s that compared surveilled and non-surveilled populations and found surveillance to be associated with reduced colorectal cancer (CRC) incidence and improved survival. More recent longitudinal observational studies, most without comparator groups, found a high incidence of CRC in LS populations despite their residence in countries where surveillance was recommended. A small number of studies directly assessed time since last colonoscopy against CRC incidence and stage, with mixed findings. Finally, cross-country comparisons of CRC incidence rates and surveillance interval recommendations found no relationship between the two [1,2].

What this study adds: Here, we conduct an observational cohort study on a large national cohort of MMR germline pathogenic variant (GPV) carriers (MMR-carriers) in England (n=4,732), comparing CRC incidence and mortality in individuals with a record of regular surveillance with those without. Through linkage of the English National Lynch Syndrome Registry to Hospital Episode Statistics data, we are uniquely able to study a comprehensive national population of MMR-carriers and identify the dates on which colonoscopies were undertaken over time, allowing assessment of adherence to national surveillance guidelines and of its impact on CRC outcomes. Notably, receipt of regular colonoscopy was strongly associated with deprivation and ethnicity.
The results show that regular surveillance at an average interval of 3 years or less is not associated with a reduction in CRC incidence compared with less frequent surveillance, but an apparent decrease in both CRC-specific and overall mortality is observed, even after adjustment for confounding variables. Conversely, regular surveillance at an average interval of 2 years or less is associated with an increase in CRC incidence compared with less frequent surveillance, which may reflect increased diagnosis of early-stage cancers or, given the absence of a reduction in late-stage cancers, overdiagnosis. The observed impact of surveillance on overall mortality may reflect an effect of surveillance on CRC-specific mortality or, in the context of an observational (non-randomised) study, selection bias.

How this study might affect research, practice, or policy: Evidence for the benefit of surveillance colonoscopy remains mixed. Whilst polypectomy would be expected to prevent CRC development (thus reducing CRC incidence), several studies have observed an increased frequency of CRCs in MMR-carriers undergoing frequent surveillance colonoscopy, which may reflect overdiagnosis. The selection bias inherent in observational studies of surveillance makes mortality outcomes challenging to interpret. Randomised controlled trials of colonoscopic surveillance in MMR-carriers are required to assess the effectiveness of this intervention accurately. Given ethical and feasibility challenges, such trials might be complemented by quasi-experimental designs that use advanced observational methods to assess effectiveness.
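The Kaplan-Meier method named in the Design section estimates survival from follow-up data that include censored individuals. A minimal Python sketch follows; the function name and toy data are hypothetical and not drawn from the study, and tied times are handled by counting all events at a time point before removing censored records, which is a common convention rather than necessarily the one the authors used.

```python
from itertools import groupby


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (hypothetical helper).

    times:  follow-up time for each individual
    events: 1 if the outcome (e.g. death) occurred at that time, 0 if censored
    """
    records = sorted(zip(times, events))
    at_risk = len(records)
    survival = 1.0
    curve = []
    for t, group in groupby(records, key=lambda r: r[0]):
        group = list(group)
        deaths = sum(e for _, e in group)
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Everyone observed at t (event or censored) then leaves the risk set.
        at_risk -= len(group)
    return curve


# Toy example: four individuals, one censored at time 2.
print(kaplan_meier([1, 2, 2, 3], [1, 0, 1, 1]))
```

The corresponding cumulative incidence at each time is simply 1 - S(t).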

2

Comparative fine-mapping of breast cancer susceptibility loci using summary statistics methods and multinomial regression

O'Mahony, D. G.; Beasley, J.; Zanti, M.; Dennis, J.; Dutta, D.; Kraft, P.; Kristensen, V.; Chenevix-Trench, G.; Easton, D. F.; Michailidou, K.

2026-04-22 epidemiology 10.64898/2026.04.21.26351364 medRxiv

Top 0.1%

10.0%

Summary statistics fine-mapping methods offer advantages over classical methods, including avoidance of data-sharing constraints and improved modelling of correlated variables and sparse effects. However, their performance has not been comprehensively evaluated in breast cancer using real-world data. Previous multinomial stepwise regression (MNR) fine-mapping analyses for breast cancer identified 196 credible sets. Here, we apply summary statistics fine-mapping, compare methods, and assess parameters influencing performance. Using summary statistics from the Breast Cancer Association Consortium, we compared finiMOM, SuSiE, and FINEMAP with published MNR results across 129 regions. Performance was assessed by recall using in-sample and out-of-sample linkage disequilibrium (LD). Discordant credible sets were examined for technical factors, and target genes were defined using the INQUISIT pipeline. SuSiE showed the closest agreement with MNR. Results varied across regions depending on the assumed number of causal variants (L); higher values reduced recall, and no single L maximised performance. At the optimal L per region, SuSiE identified 8,192 CCVs in 244 credible sets, with recall of 88%, 86%, and 72% for overall, ER-positive, and ER-negative breast cancer, respectively. Thirty MNR credible sets were missed. Discordance was partially explained by allele flips, imputation quality, and array heterogeneity. Fifty-two MNR-identified genes, including BRCA2, WNT7B, and CREBBP, were not recovered, while additional candidate genes were identified. Using out-of-sample LD reduced recall by 3% but identified novel variants. Fine-mapping results vary across methods, and no single approach is sufficient. The choice of L strongly influences results, and combining analytical approaches with functional validation can improve causal variant identification.
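Recall in this comparison is the fraction of reference (MNR) credible sets that a summary-statistics method recovers. The Python sketch below is illustrative only: the matching rule (a reference set counts as recovered if any test credible set shares at least one variant with it) is an assumption, not necessarily the authors' criterion, and the variant IDs are hypothetical.

```python
def credible_set_recall(reference_sets, test_sets):
    """Fraction of reference credible sets overlapped by at least one test set.

    Each credible set is an iterable of variant identifiers.
    Assumed matching rule (hypothetical): any shared variant counts as recovery.
    """
    reference = [set(s) for s in reference_sets]
    tested = [set(s) for s in test_sets]
    if not reference:
        raise ValueError("need at least one reference credible set")
    recovered = sum(1 for ref in reference if any(ref & t for t in tested))
    return recovered / len(reference)


# Hypothetical credible sets: two of the four reference sets are recovered.
mnr_sets = [{"rs1", "rs2"}, {"rs3"}, {"rs4", "rs5"}, {"rs6"}]
susie_sets = [{"rs2", "rs9"}, {"rs5"}, {"rs7"}]
print(credible_set_recall(mnr_sets, susie_sets))
```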

3

Novel Genetic Risk Loci for Pancreatic Ductal Adenocarcinoma Identified in a Genome-wide Study of African Ancestry Individuals

Vergara, C.; Ni, Z.; Zhong, J.; McKean, D.; Connelly, K. E.; Antwi, S. O.; Arslan, A. A.; Bracci, P. M.; Du, M.; Gallinger, S.; Genkinger, J.; Haiman, C. A.; Hassan, M.; Hung, R. J.; Huff, C.; Kooperberg, C.; Kastrinos, F.; LeMarchand, L.; Lee, W.; Lynch, S. M.; Moore, S. C.; Oberg, A. L.; Park, M. A.; Permuth, J. B.; Risch, H. A.; Scheet, P.; Schwartz, A.; Shu, X.-O.; Stolzenberg-Solomon, R. Z.; Wolpin, B. M.; Zheng, W.; Albanes, D.; Andreotti, G.; Bamlet, W. R.; Beane-Freeman, L.; Berndt, S. I.; Brennan, P.; Buring, J. E.; Cabrera-Castro, N.; Campa, D.; Canzian, F.; Chanock, S. J.; Chen, Y.;

2026-04-22 genetic and genomic medicine 10.64898/2026.04.21.26351329 medRxiv

Top 0.1%

6.8%

Pancreatic cancer disproportionately affects Black individuals in the United States, but they have limited representation in genetic studies of pancreatic ductal adenocarcinoma (PDAC). To address this gap, we performed admixture mapping and a genome-wide association study (GWAS) in genetically inferred African ancestry individuals (1,030 cases and 889 controls). Admixture mapping identified three regions with a significantly higher proportion of African ancestry in cases than in controls (5q33.3, 10p1, 22q12.3). The GWAS identified a genome-wide significant association at 5p15.33 (CLPTM1L, rs383009:T>C; T allele frequency = 0.51; OR = 1.45; P = 1.24 × 10⁻⁸), a locus previously associated with PDAC. Known loci at 5p15.33, 7q32.3, 8q24.21 and 7q25.1 also replicated (P < 0.01). Multi-ancestry fine-mapping identified two potential causal SNPs (rs3830069 and rs2735940) at 5p15.33. Collectively, these findings identify novel PDAC risk loci and expand our understanding of this deadly cancer in underrepresented populations, emphasizing the multifactorial nature of PDAC risk, including inherited genetic and non-genetic factors. Statement of Significance: To understand how genetic variation contributes to PDAC risk in Black people in North America, we studied individuals of genetically inferred African ancestry. We identified novel risk loci and differences in the contribution of known loci. This demonstrates that ancestry-informed genetic analyses improve our understanding of PDAC risk and enhance discovery.

4

Assessing potential harms from screening overdiagnosis and false positives with multicancer early detection tests

Malagon, T.; Russell, W. A.; Burnier, J. V.; Dickinson, K.; Brenner, D.

2026-04-13 oncology 10.64898/2026.04.09.26348927 medRxiv

Top 0.1%

4.9%

Background: Multicancer early detection tests could be used for cancer screening but may lead to harms, including false-positive results and overdiagnosis of indolent tumours that would not have become clinically evident during the person's lifetime. We assessed the potential for these screening harms in the context of future population-based screening with a multicancer early detection test. Methods: We used a microsimulation model to assess potential population-level impacts of screening at ages 50-75 years with a multicancer early detection test in Canada. We assumed high test specificity (97-99.1%) and test sensitivity increasing with cancer stage. The model includes latent indolent cancers that would not be diagnosed within the person's lifetime but can be overdiagnosed through screen detection. We calculated the yearly and cumulative lifetime probabilities of screening overdiagnosis and false-positive test results, assuming a range of preclinical screen-detectable periods (2-5 years). Results: An estimated 2.1-6.0% of all yearly screen-detected cancers were predicted to be overdiagnoses across scenarios. The proportion of overdiagnosis varied by site and increased strongly with age, from 1% of screen-detected cancers at age 50 to over 10% by age 75. The test positive predictive value ranged from 15.9% to 77.6%, meaning that there could be 0.3-5.3 false positives with no underlying cancer for every true cancer case detected by the test. Conclusion: Population-level screening with a multicancer early detection test would likely not lead to substantial screen-related overdiagnosis. Healthcare systems should consider how screening false positives may increase their diagnostic service caseload.
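The step from the reported PPV range to "0.3-5.3 false positives for every true cancer" follows from the definition PPV = TP / (TP + FP), which rearranges to FP / TP = (1 - PPV) / PPV. This Python sketch (the function name is hypothetical) recovers both endpoints from the PPV bounds stated in the abstract.

```python
def false_positives_per_true_cancer(ppv):
    """False positives expected for each true cancer detected, given PPV.

    PPV = TP / (TP + FP), so FP / TP = (1 - PPV) / PPV.
    """
    if not 0 < ppv <= 1:
        raise ValueError("PPV must be in (0, 1]")
    return (1 - ppv) / ppv


# Endpoints of the reported PPV range (77.6% and 15.9%).
for ppv in (0.776, 0.159):
    print(ppv, round(false_positives_per_true_cancer(ppv), 1))
```

A PPV of 77.6% gives about 0.3 false positives per true cancer, and a PPV of 15.9% gives about 5.3.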

5

Quantitative and qualitative patient-reported analysis of misdiagnosis and/or late diagnosis of metastatic lobular cancer

Cody, M. E.; Chang, H.-C.; Foldi, J.; Jankowitz, R. C.; Balic, M.; Cushing, T.; Donnelly, C.; Freeney, S.; Levine, J.; Petitti, L.; Ryan, N.; Spencer, K.; Turner, C.; Tseng, G. C.; Desmedt, C.; Oesterreich, S.; Lee, A. V.

2026-04-20 oncology 10.64898/2026.04.16.26348799 medRxiv

Top 0.1%

4.7%

Background: Invasive lobular breast cancer (ILC) is the most commonly diagnosed special histological subtype of breast cancer (BC). FDG-PET imaging is less sensitive for metastatic ILC (mILC), which often metastasizes to unusual sites, including the peritoneum, gastrointestinal (GI) tract, ovaries, urinary tract, and orbit, and may go unrecognized after a long disease-free interval. Some metastatic sites cause nonspecific symptoms, such as abdominal/epigastric pain, and numerous published case reports describe mILC misdiagnosed as gastric cancer. These atypical BC metastatic sites may lead to late diagnosis or misdiagnosis, thereby delaying effective treatment. Objective: We developed a patient survey to investigate the patient-reported prevalence of delayed diagnosis or misdiagnosis of mILC and its potential impact on treatment outcomes. Methods: A 45-question survey was developed and piloted with breast cancer researchers, clinical oncologists, and patient advocates. This IRB-approved survey was then distributed to patients with ILC. Analyses, including data QC and visualization, were conducted in R using descriptive statistics. Incomplete or inconsistent responses were excluded, and summary statistics were stratified by four common mILC sites to highlight subgroup differences. Results: A total of 525 patient surveys were completed; 450 respondents had been diagnosed with ILC, of whom 321 had mILC. Among those with mILC, 33.3% (n=107) had de novo mILC at initial presentation, and 32.1% (n=103) presented with other medical conditions at diagnosis. Misdiagnosis was reported by 26.2% (n=84) of patients with mILC; of these, 31% (n=26) had ≥2 misdiagnoses. The top five misdiagnoses were bone-related condition (24.7%), benign breast condition (23.4%), another type of BC (7.8%), diagnostic delay (7.8%), and menopause-related condition (5.2%). Overall, 44.5% of patients waited ≥1 year for an accurate diagnosis.
Forty-nine patients were treated for their misdiagnosis, and 6 received incorrect cancer treatment. The most frequently reported contributors to delayed diagnosis or misdiagnosis were inconclusive imaging, providers' lack of ILC knowledge, and initial misdiagnosis. Of the 321 patients with mILC, 138 (42.9%) reported symptoms before diagnosis; the most common were back pain (16.5%), fatigue/malaise (14.9%), GI symptoms (11.8%), bloating (8.4%), and weight loss (8.1%). Although 40% of patients reported having a mammogram at the time of their initial misdiagnosis, ILC was detected in only 20.5% (24/116) of these cases, and mammography detected only 5 (25%) of the 20 de novo mILC cases. Patients reported additional diagnostic testing, including biopsy, ultrasound (US), and MRI, within 1-3 months of their initial mammogram. At the time of their mILC diagnosis, 47.9% of patients were in active BC surveillance after curative-intent therapy; however, their time to diagnosis did not differ significantly from that of patients not under surveillance. Conclusion: Our survey results underscore the urgent need to improve diagnostic strategies for mILC. Addressing delays and diagnostic errors in mILC is critical to optimizing treatment strategies and improving patient outcomes.

6

Genomic ascertainment of PALB2-related cancer predisposition

Stewart, D.; Kim, J.; Haley, J. S.; Li, J.; Sargen, M. R.; Hong, H. G.; Tischkowitz, M.; McReynolds, L. J.; Carey, D. J.

2026-04-04 genetic and genomic medicine 10.64898/2026.04.03.26349984 medRxiv

Top 0.1%

4.7%

PURPOSE: To evaluate cancer risk, age-specific penetrance, and mortality associated with heterozygous pathogenic or likely pathogenic (P/LP) germline PALB2 variants identified through genomic ascertainment and to assess modification by family history of cancer. PATIENTS AND METHODS: We conducted a case-control study in two large population-based adult cohorts: the UK Biobank (n=469,580) and Geisinger MyCode (n=167,050). Individuals with heterozygous PALB2 P/LP variants were identified via exome sequencing and compared with non-carriers. Cancer diagnoses and vital status were obtained from linked registry and electronic health record data. We used multivariable logistic regression to estimate odds ratios (ORs) for cancer outcomes and Cox proportional hazards models to estimate hazard ratios (HRs) for all-cause mortality. Age-specific cumulative incidence (penetrance) was estimated using Kaplan-Meier methods. Models were adjusted for birth year, sex (when applicable), smoking status, and body mass index; stratified analyses assessed modification by family history of cancer. RESULTS: PALB2 P/LP variant prevalence was 1:571 in UK Biobank and 1:940 in MyCode, with the higher prevalence in the UK cohort driven by the PALB2 p.Trp1038Ter founder variant. Compared with non-carriers, heterozygotes had significantly increased odds of any cancer, female breast cancer, pancreatic cancer, and cancers of ill-defined or secondary sites in both cohorts (P < 0.01). Adjusted hazard ratios for any cancer and female breast cancer ranged from 1.7 to 3.6. All-cause mortality was increased among PALB2-heterozygotes (HR 1.61-1.67), and survival after cancer diagnosis was reduced. Family history further modified cancer risk. CONCLUSION: Genomic ascertainment of PALB2-heterozygotes identifies elevated risk for multiple cancers and increased mortality, although risks were lower than estimates from familial ascertainment.
These findings inform risk management for individuals identified through genomic screening.

7

Comparing Gleason Pattern 4 Measurement Approaches on Prostate Biopsy Using Machine Learning: A Proof-of-Principle Study

Buzoianu, M. M.; Yu, R.; Assel, M.; Bozkurt, A.; Aghdam, H.; Fine, S.; Vickers, A.

2026-04-24 oncology 10.64898/2026.04.23.26351615 medRxiv

Top 0.1%

4.4%

Objective: To demonstrate, as a proof of principle, that machine learning (ML) can be used to quantify Gleason Pattern (GP) 4 on digitized biopsy slides using multiple measurement approaches, allowing direct comparison of their prognostic performance. Methods: We assembled a convenience sample of 726 patients with grade group 2-4 prostate cancer on systematic biopsy who underwent radical prostatectomy between 2014 and 2023. Digitized biopsy slides were analyzed using a machine-learning algorithm (PAIGE-AI) to quantify GP4 using multiple measurement approaches, which differed particularly in how gaps between cancer foci (interfocal stroma) were handled. GP4 extent was quantified using linear measurements or a pixel-based area metric. The discrimination of each GP4 quantification approach, and of Grade Group (GG), was assessed for adverse radical prostatectomy pathology and biochemical recurrence (BCR). Results: We identified 15 different quantification approaches and observed differences in their discrimination. The pixel-counting method had the highest discrimination (AUC 0.648). GP4 quantification outperformed GG for predicting adverse pathology (AUC 0.627 vs 0.608). The amount of GP3 was not predictive once GP4 was known. These findings were consistent for BCR. Conclusions: We were able to measure slides using 15 distinct measurement approaches and, using ML to quantify GP4, replicated prior findings. Our findings support the use of ML as a research tool to compare different GP4 quantification approaches. We intend to apply our method to larger cohorts to determine which measurement approach best predicts oncologic outcome.
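The AUC values compared above (for example 0.627 vs 0.608) measure discrimination: the probability that a randomly chosen patient with the outcome has a higher score than a randomly chosen patient without it. A minimal Python sketch of that concordance (Mann-Whitney) definition follows; the function and toy data are hypothetical and unrelated to the PAIGE-AI software or the study data.

```python
def auc(scores, labels):
    """Probability that a random positive case scores above a random negative.

    Ties count as half a win (the Mann-Whitney formulation of the ROC AUC).
    labels: 1 if the outcome is present (e.g. adverse pathology), else 0.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical GP4 measurements for four patients; 3 of 4 pairs are concordant.
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which puts the reported values in the modest range.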

8

Prospective Population-Scale Validation of an Electronic Health Record Based Model for Pancreatic Cancer Risk

Lahtinen, E.; Schigiltchoff, N.; Jia, K.; Kundrot, S.; Palchuk, M. B.; Warnick, J.; Chan, L.; Shigiltchoff, N.; Sawhney, M. S.; Rinard, M.; Appelbaum, L.

2026-04-13 oncology 10.64898/2026.04.11.26350318 medRxiv

Top 0.1%

4.4%

Background and aims: Pancreatic ductal adenocarcinoma (PDAC) surveillance is limited to individuals with familial or genetic risk, although most future cases arise outside these groups. In a retrospective study, PRISM, an electronic health record (EHR)-based PDAC risk model, identified individuals in the general population at elevated near-term risk of PDAC. We aimed to evaluate prospectively whether PRISM can identify high-risk individuals beyond current surveillance groups across U.S. health systems. Methods: We performed a prospective multicenter cohort study after deployment of PRISM in April 2023 across 44 U.S. health care organizations. Eligible adults aged ≥40 years without prior PDAC received a single baseline risk score and were assigned to prespecified risk tiers. Patients were followed for incident PDAC for 30 months. We estimated tier-specific 30-month cumulative incidence (positive predictive value, PPV), number needed to screen (NNS), standardized incidence ratios (SIRs), and time from deployment and from first high-risk flag to diagnosis. Results: Among 6,282,123 adults assigned a PRISM score, 5,058,067 had follow-up; 3,609 developed PDAC. The highest-risk tier had 30-fold higher PDAC incidence than the study population. At the SIR 5 threshold, 30-month cumulative incidence was 0.35% (NNS, 284.2); at SIR 16, 1.14% (NNS, 87.4); and at SIR 30, 2.19% (NNS, 45.7). Median time from deployment to PDAC diagnosis was 9.5 months, and median time from first high-risk flag to diagnosis at SIR 5 was 3.5 years. Shapley additive explanations (SHAP) analyses supported patient- and tier-level interpretability. Conclusions: Prospective deployment of PRISM across multiple U.S. health care organizations identified individuals at elevated near-term risk for PDAC, with substantial risk enrichment and lead time before diagnosis. These findings support the real-world scalability and generalizability of EHR-based risk stratification for risk-adapted early detection.
ClinicalTrials.gov identifier NCT05973331
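The reported NNS values are consistent with NNS = 1 / cumulative incidence (PPV). At SIR 30, for instance, 1 / 0.0219 ≈ 45.7. The small gaps at the lower tiers (e.g. 1 / 0.0035 ≈ 285.7 vs the reported 284.2) presumably arise because the displayed percentages are rounded. The Python sketch below shows these two ratios with hypothetical function names. The SIR helper uses the standard observed/expected definition, and the expected counts it would need are not given in the abstract.

```python
def number_needed_to_screen(cumulative_incidence):
    """NNS = 1 / cumulative incidence (here, the tier-specific 30-month PPV)."""
    if not 0 < cumulative_incidence <= 1:
        raise ValueError("cumulative incidence must be in (0, 1]")
    return 1 / cumulative_incidence


def standardized_incidence_ratio(observed, expected):
    """SIR = observed cases / cases expected from reference incidence rates."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


print(round(number_needed_to_screen(0.0219), 1))  # SIR 30 tier from the abstract
```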

9

Weight Trajectories and Cancer Risk: A Pooled Cohort Study

Nilsson, A.; da Silva, M.; Le, H. T.; Haggstrom, C.; Wahlstrom, J.; Michaelsson, K.; Trolle Lagerros, Y.; Sandin, S.; Magnusson, P. K.; Fritz, J.; Stocks, T.

2026-04-24 epidemiology 10.64898/2026.04.23.26351553 medRxiv

Top 0.1%

4.2%

Excess body weight has been associated with increased cancer risk, but the role of weight change across adulthood remains unclear. We examined body weight trajectories from ages 17 to 60 and their associations with site-specific cancer incidence. Data came from the ODDS study, a pooled, nationwide cohort study in Sweden with weight data spanning 1911 to 2020 and cancer follow-up through 2023. Weight trajectories were estimated with linear mixed-effects models in individuals with at least three weight measurements. Cox regression was used to estimate hazard ratios for associations between weight trajectories and established and potentially obesity-related cancers. The fifth versus first quintile of weight change was associated with higher incidence of many cancers, most strongly esophageal adenocarcinoma in men (HR 2.25; 95% CI 1.66-3.04), liver cancer in men (HR 2.67; 95% CI 2.15-3.33), endometrial cancer in women (HR 3.78; 95% CI 3.09-4.61), and pituitary tumors in both sexes (men: HR 3.13 [95% CI 2.13-4.61]; women: HR 2.13 [95% CI 1.41-3.22]). Associations varied by sex and age. Heavier weight at age 17 years and earlier obesity onset were also associated with higher cancer incidence. These findings highlight the importance of a life-course approach to weight management and support sex- and age-targeted cancer prevention strategies.

10

DNA methylation signatures of mismatch repair-deficient colorectal cancer

Ward, R.; Endicott, M.; Mallabar-Rimmer, B.; Burrage, J.; Sherwood, K.; Huang, Q.; Ward, J. C.; Thorn, S.; Woolley, C.; Wood, S.; Dempster, E.; Green, H. D.; Tomlinson, I.; Webster, A. P.

2026-04-13 cancer biology 10.64898/2026.04.09.717165 medRxiv

Top 0.1%

4.2%

Background: Colorectal cancer (CRC) is a molecularly heterogeneous disease shaped by both genetic and epigenetic alterations. Approximately 15% of CRCs display widespread CpG island hypermethylation, known as the CpG Island Methylator Phenotype (CIMP). CIMP-high (CIMP-H) tumours frequently exhibit MLH1 promoter hypermethylation, leading to mismatch repair deficiency (MMRd) and microsatellite instability (MSI). However, DNA methylation patterns associated with MSI, independent of CIMP and MLH1 silencing, and the influence of clinical variables such as anatomical location and patient age on the CRC methylome remain poorly characterised. Methods: We performed epigenome-wide DNA methylation profiling of 259 primary CRC tissue samples using the Illumina EPICv2 array, comparing differential methylation between MSI and microsatellite stable (MSS) CRC and adjusting for tumour purity, MLH1 promoter methylation, CIMP status, and anatomical location to account for known confounders. We further evaluated the independent effects of anatomical location and patient age on global methylation patterns. Results: Epigenome-wide differential methylation between MSS and MSI CRC was dominated by MLH1 promoter hypermethylation. After adjusting for MLH1 hypermethylation and CIMP status, we identified a distinct set of 656 CpG sites associated with MMRd independent of MLH1 silencing. These included hypermethylation at LRP6, GSK3β, and CDK12, implicating altered WNT signalling and transcriptional regulation pathways. Comparison of MSI subgroups revealed the co-occurrence of MLH1 hypermethylation with promoter hypermethylation at TXNRD1. Anatomical location showed a strong independent effect on methylation patterns, while we observed only modest effects of patient age on the CRC methylome after adjustment for confounders.
Conclusions: We identified a distinct methylation profile distinguishing MSS and MSI CRC, including MLH1-independent markers of MMRd, as well as novel differentially methylated loci within MSI subgroups. We further showed that anatomical location has a strong independent impact on the CRC methylome. Together, these findings refine the molecular characterisation of CRC and highlight potential epigenetic markers that could inform patient stratification and precision oncology.

11

A catalogue of missense and nonsense mutation abundances for the U.S. cancer patient population

Arun, A.; Liarakos, D.; Mendiratta, G.; McFall, T.; Hargreaves, D. C.; Wahl, G. M.; Hu, J.; Stites, E. C.

2026-04-22 oncology 10.64898/2026.04.20.26351248 medRxiv

Top 0.1%

4.1%

Widespread genomic sequencing efforts have characterized the molecular foundations of different cancers. By combining these genomic data in proportion to the population-level abundance of each cancer type, we estimate the overall abundance of each observed missense and nonsense mutation within the U.S. cancer patient population. We find that BRAF V600E (5.2%) is the most common mutation in the cancer patient population, TP53 R175H (1.5%) is the most common tumor suppressor mutation, and APC R876X (0.4%) is the most common nonsense mutation. These values differ substantially and significantly from those found in a typical pan-cancer analysis, in which cancer types are represented out of proportion to their population-level incidence. We present the full ordered lists of population-level abundances for specific missense and nonsense mutations, and we demonstrate the value of these data by further analyzing high-priority genes (e.g., TP53, KRAS, BRAF) and pathways (e.g., RTK/RAS, PI3K, and WNT/β-catenin). Overall, this resource should benefit the basic science, translational, and clinical cancer research communities.
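The core idea is to weight each cancer type's mutation frequency by that type's share of the patient population rather than by how many samples happened to be sequenced. It can be sketched in Python as a weighted sum. Both functions and all numbers below are hypothetical illustrations, not the authors' pipeline or data.

```python
def population_abundance(incidence_share, mutation_freq):
    """Population-level abundance of one mutation across cancer types.

    incidence_share: {cancer_type: fraction of all cancer patients}, sums to 1
    mutation_freq:   {cancer_type: fraction of that type's tumours carrying the
                      mutation}; types missing here are treated as 0
    """
    if abs(sum(incidence_share.values()) - 1.0) > 1e-6:
        raise ValueError("incidence shares must sum to 1")
    return sum(share * mutation_freq.get(cancer, 0.0)
               for cancer, share in incidence_share.items())


def pooled_cohort_frequency(sample_counts, mutation_freq):
    """Naive pan-cancer frequency: weights each type by its sequenced sample count."""
    n = sum(sample_counts.values())
    return sum(count * mutation_freq.get(cancer, 0.0)
               for cancer, count in sample_counts.items()) / n


# Hypothetical example: a mutation present in half of type_a tumours only.
freq = {"type_a": 0.5}
print(population_abundance({"type_a": 0.2, "type_b": 0.8}, freq))         # ~0.10
print(pooled_cohort_frequency({"type_a": 500, "type_b": 500}, freq))      # 0.25
```

Because the hypothetical cohort over-samples type_a relative to its incidence, the naive pooled estimate is inflated. This is the kind of discrepancy the abstract describes.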

12

Clinical and pathological characteristics of thin cutaneous melanomas with rapid recurrence.

Bhave, P.; Wong, T.; Margolin, K.; Hoeijmakers, L.; Mangana, J.; Vitale, M. G.; Ascierto, P. A.; Maurichi, A.; Santinami, M.; Heddle, G.; Allayous, C.; Lebbe, C.; Kattak, A.; Forchhammer, S.; Kessels, J. I.; Lau, P.; Lo, S. N.; Papenfuss, A. A.; McArthur, G. A.

2026-04-06 oncology 10.64898/2026.04.04.26350182 medRxiv

Top 0.1%

3.7%

Background: Although thin (T1) melanomas have an excellent cure rate with surgery alone, >25% of melanoma deaths originate from thin melanomas (TMs). There is therefore an urgent need to improve the identification and management of patients with TMs at high risk of recurrence. Methods: Patients with T1 melanoma and recurrence ≤2 years after diagnosis (T1 rapid group) were compared with patients with T1 melanoma and recurrence ≥10 years after diagnosis (T1 late group). Results: A total of 442 patients from 14 sites were included: 310 and 132 patients in the T1 rapid and late groups, respectively. Median age at primary melanoma diagnosis was 51 years (range 15-85); 272 (62%) were male, 254 (58%) had superficial spreading melanoma, and 101 (23%) had a head/neck primary. The majority (73%) of recurrences in the T1 rapid group were locoregional. In univariable logistic regression analysis, age >65 years (p<0.0001), lentigo maligna (LM) melanoma subtype (p=0.025), head/neck primary site (p=0.0065), mitoses ≥1/mm² (p=0.0181), and ulceration (p=0.0087) were significantly associated with T1 rapid recurrence compared with T1 late recurrence. In multivariable analysis, age >65 years (p=0.0010), mitoses ≥1/mm² (p=0.049), and ulceration (p=0.037) remained significant. Conclusions: Rapid recurrence of TM is associated with age >65 years, LM subtype, head/neck primary site, mitoses ≥1/mm², and ulceration.

13

Gut Microbiome as a Diagnostic Biomarker for Early Cancer Detection: A Systematic Review and Meta-Analysis of 18 Studies across Five Cancer Types

TALL, M. l.

2026-04-22 cancer biology 10.64898/2026.04.19.719461 medRxiv

Top 0.2%

3.6%

Background: The gut microbiome has emerged as a promising non-invasive biomarker for early cancer detection. However, evidence remains fragmented across individual studies, with limited cross-cancer comparisons. Objectives: To systematically evaluate the diagnostic accuracy of gut microbiome-based signatures across five major cancer types: colorectal cancer (CRC), gastric cancer (GC), pancreatic ductal adenocarcinoma (PDAC), hepatocellular carcinoma (HCC), and lung cancer (LC). Methods: We conducted a systematic literature search in PubMed, Embase, and Web of Science (January 2000 to April 2026), following PRISMA 2020 guidelines. Studies reporting area under the receiver operating characteristic curve (AUC) for microbiome-based cancer classification were included. Pooled AUC estimates were derived using a DerSimonian-Laird random-effects model. Study quality was assessed using the Newcastle-Ottawa Scale (NOS). Results: Eighteen studies (2,587 participants) met inclusion criteria. Pooled AUC values were: CRC 0.785 (95% CI 0.750-0.819; I²=30.6%), GC 0.834 (0.781-0.887; I²=56.6%), PDAC 0.853 (0.785-0.921; I²=60.8%), HCC 0.809 (0.747-0.871; I²=70.3%), and LC 0.780 (0.738-0.822; I²=25.0%). Fusobacterium nucleatum was consistently enriched across CRC, GC, and PDAC, while Faecalibacterium prausnitzii and Akkermansia muciniphila were depleted in all five cancer types. Porphyromonas gingivalis showed the highest fold-change in PDAC (log₂FC=+2.8). Risk of bias was moderate-to-high in all studies. Conclusions: Gut microbiome profiling demonstrates good-to-excellent diagnostic accuracy (AUC 0.78-0.85) across five major cancer types. Shared cross-cancer biomarkers suggest common dysbiotic mechanisms amenable to pan-cancer screening. These findings support integration of microbiome signatures into multi-modal cancer detection platforms.
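The pooled AUCs and I² values above come from a DerSimonian-Laird random-effects model. The Python sketch below implements the standard DL formulas: fixed-effect weights, Cochran's Q, the moment estimate of between-study variance τ², and re-weighting by 1/(vᵢ + τ²). The function name and inputs are hypothetical. The sketch pools on whatever scale it receives, and some analyses pool AUCs on a transformed scale instead, which may differ from what the authors did.

```python
import math


def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate via the DerSimonian-Laird method.

    estimates: per-study effect estimates (e.g. AUCs)
    variances: their within-study variances (squared standard errors)
    Returns (pooled, standard_error, tau2, i2).
    """
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need >= 2 studies with matching variances")
    w = [1 / v for v in variances]                  # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                   # between-study variance
    w_star = [1 / (v + tau2) for v in variances]    # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    # I²: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se, tau2, i2


# Hypothetical studies; a 95% CI is pooled ± 1.96 * se.
pooled, se, tau2, i2 = dersimonian_laird([0.7, 0.9], [0.001, 0.001])
print(pooled, pooled - 1.96 * se, pooled + 1.96 * se, i2)
```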

14

Cancer-Type Specific Prognostic Impact of Concurrent TP53 and KRAS Alterations: A Multi-Cohort Genomic Analysis

Pan, G.

2026-03-30 oncology 10.64898/2026.03.29.26349383 medRxiv

Top 0.2%

3.5%

Background: The tumor suppressor gene TP53 and the oncogene KRAS are among the most frequently altered core drivers in human malignancies. Although they cooperatively regulate critical biological processes, the prognostic impact of their co-alterations remains poorly defined and varies strikingly across cancer types. Methods: We comprehensively analyzed genomic and clinical data from multi-cancer cohorts sourced from the cBioPortal database and The Cancer Genome Atlas (TCGA). Genetic alterations, including sequence variations and copy number alterations (CNAs), were classified for TP53 and KRAS. Patients were stratified into four subgroups based on individual or combined alteration status. Survival analyses were performed using Kaplan-Meier methods. Integrated multi-omics analyses were conducted to assess the relationship between genetic alterations and mRNA/protein expression, and to characterize co-occurring genetic events and their prognostic implications. Results: Patients harboring concurrent TP53 and KRAS alterations had significantly shorter overall survival in pancreatic cancer, colorectal cancer, and ampullary carcinoma, but surprisingly had the longest survival in gastric cancer. Distinct KRAS mutation subtype distributions were observed across cancer types: G12D/G12V predominated in pancreatic and colorectal cancers, G12C in non-small cell lung cancer, and G13D in gastric cancer, with copy number alterations accounting for a substantial proportion of KRAS alterations in gastric and lung cancers. Multi-omics analysis revealed a lack of concordance between genetic alterations and mRNA/protein expression, indicating that mutation status alone does not reliably reflect downstream molecular changes. 
Concurrent genetic events displayed striking cancer-type specificity: CDKN2A alterations frequently co-occurred with TP53/KRAS double alterations in pancreatic cancer and were associated with worse prognosis, whereas APC mutations co-occurred in colorectal cancer and correlated with improved survival. Integrated analysis further demonstrated that KRAS-altered/TP53-altered patients were highly enriched in pancreatic, colorectal, and lung cancers, each exhibiting unique background genomic landscapes. Conclusions: The prognostic significance of TP53 and KRAS alterations is profoundly cancer-type specific, driven by differences in mutation subtype distribution, copy number alteration patterns, co-occurring genetic events, and the discordance between genotype and functional expression. These findings challenge the simplistic view of dual-gene alterations as universal markers of poor prognosis and underscore the necessity of incorporating cancer-specific molecular contexts into prognostic models and precision oncology strategies.
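
The Kaplan-Meier (product-limit) estimator used for the survival comparisons can be sketched as below. This is an illustrative, hypothetical helper, not the study's pipeline: it takes follow-up times and event indicators (1 = death, 0 = censored), steps the survival curve down only at event times, and removes censored patients from the risk set after their last follow-up.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate as (event time, survival probability) steps.

    events[i] is 1 if subject i died at times[i] and 0 if censored there.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every subject whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censorings at t leave the risk set afterwards
        at_risk -= removed
    return curve


if __name__ == "__main__":
    # Toy follow-up data in months (hypothetical)
    print(kaplan_meier([6, 12, 12, 20, 30], [1, 1, 0, 1, 0]))
```

Running it separately for, say, the TP53/KRAS co-altered group and the remaining patients gives the curves that a log-rank test would then compare; production analyses would typically rely on an established package such as R's `survival` or Python's `lifelines`.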

15

Validation of Immunoscore for Prognostic Stratification in HPV-associated Oropharyngeal Cancer: An International Multicenter Study

Nguyen, D. H.; Majdi, A.; Marliot, F.; Houtart, V.; Kirilovsky, A.; Hijazi, A.; Fredriksen, T.; de Sousa Carvalho, N.; Bach, A.- S.; Gaultier, A.- L.; Fabiano, E.; Kreps, S.; Tartour, E.; Pere, H.; Veyer, D.; Blanchard, P.; Angell, H. K.; Pages, F.; Mirghani, H.; Galon, J.

2026-04-11 oncology 10.64898/2026.04.08.26350238 medRxiv

Top 0.2%

2.8%

Background: Treatment optimization in HPV-associated oropharyngeal cancer (OPSCC) remains challenging, as recent de-escalation trials have shown limited success. Current patient selection strategies based on smoking history and TNM classification are insufficient, highlighting the need for robust, standardized prognostic biomarkers. We report the first validation of the Immunoscore (IS) for prognostic stratification in HPV-associated OPSCC. Patients and methods: We analyzed 191 patients with HPV-associated (p16+ and HPV DNA/RNA+) OPSCC from an international multicenter cohort (2015-2024), comprising a French monocentric retrospective training cohort (N = 48) and three validation cohorts: French monocentric retrospective (N = 48), French multicenter prospective (N = 50), and US multicenter retrospective (N = 45). IS is a standardized digital pathology assay quantifying CD3+ and CD8+ cell densities in tumor cores and invasive margins, with cut-offs defined in the training cohort and validated across cohorts. Associations with disease-free survival (DFS), time to recurrence (TTR), and overall survival (OS) were assessed, alongside 3′ RNA-seq and sequential immunofluorescence profiling of immune composition. Results: Median age was 65 years; 80% of patients were male, 74% were smokers, 66% had T1-2 tumors, and 82% were N0-1 (AJCC 8th edition). IS-High patients had superior 3-year DFS in the training cohort and in validation cohorts 1-3 (all log-rank P < 0.05). Multivariable analysis identified IS-Low as the strongest independent risk factor for DFS (HR 9.03; 95% CI: 4.02-20.31; P < 0.001). The model combining IS with clinical factors showed higher predictive accuracy for DFS (C-index 0.82) than clinical variables alone (0.7; P < 0.0001). Similar findings were observed for TTR and OS. IS-High tumors showed markedly greater enrichment of lymphoid and myeloid immune cell populations, contrasting with immune-poor signatures in IS-Low tumors. 
Conclusions: IS is a robust biomarker that outperforms standard clinical variables in both prognostic and predictive accuracy. The enriched cytotoxic immune infiltrate in IS-High tumors explains favorable outcomes and supports their suitability for treatment de-escalation. Prospective validation is warranted.
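
The C-index values reported above (0.82 versus 0.7) measure how well a risk score ranks patients by time to event. A minimal sketch of Harrell's concordance index follows, assuming higher scores mean higher predicted risk; pairs with tied follow-up times are simply skipped, a simplification that established implementations handle more carefully. The function `harrell_c_index` is hypothetical and is not the study's code.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int], risk_scores: Sequence[float]) -> float:
    """Fraction of comparable patient pairs ranked correctly by a risk score.

    A pair is comparable when the patient with the shorter follow-up had the event.
    Higher risk_scores are assumed to mean earlier expected events.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # simplification: skip tied times and pairs whose shorter time is censored
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher-risk patient failed first: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so comparing the index for a clinical-only score against a score that also includes IS is one way to express the added discrimination.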

16

Estimation of cancer cases in transgender and gender diverse people in England

Pasin, C.; Jackson, S. S.; Thynne, L.-E.; McWade, B.; Westerman, T.; Ball, R.; Kavanagh, J.; O'Callaghan, S.; Ring, K.; Orkin, C.; Berner, A. M.

2026-04-22 oncology 10.64898/2026.04.21.26351378 medRxiv

Top 0.2%

2.6%

Objectives: To estimate the current, and 5- and 10-year projected, number of cancer cases per year in transgender and gender diverse (TGD) people in England, overall and by tumour type, accounting for uptake of gender-affirming care (GAC). Design: Population-based epidemiological modelling study using an age-stratified Monte Carlo simulation approach and the NORDPRED method for predictions. Setting: Models estimating cancer case numbers for TGD people in England based on publicly available 2023 cancer surveillance data and survey-based 2025 GAC access data, with predictions at 5 and 10 years. Participants: TGD people aged 15 years and above. Main outcome measures: Primary cancer cases per year, overall and by gender, age group, tumour type, and current and planned GAC. Results: The estimated TGD population size in England is 441,547 (95% uncertainty interval (UI) 429,207-452,890). The total number of cancer cases per year in TGD people is expected to be 966 (95% UI 882-1069), excluding non-melanoma skin cancer. Most cases are expected to occur in people aged 60-64. The top five expected cancers in TGD people are breast (19%, n = 187, 95% UI 149-241), colorectal (12%, n = 117, 95% UI 106-129), lung (11%, n = 108, 95% UI 96-122), melanoma (7.1%, n = 69, 95% UI 64-74), and urinary (6.2%, n = 60, 95% UI 54-67). Total cancer cases in TGD people are estimated to be 1740 (95% UI 1584-1934) in 5 years and 2258 (95% UI 2066-2507) in 10 years (excluding non-melanoma skin cancer). If TGD people were able to access their planned level of GAC, these figures would fall to 1555 (95% CI 1386-1766) and 2012 (95% CI 1797-2282), respectively. Conclusions: This study provides predictions of cancer cases in TGD people in England to support the planning of service provision and training. This is vital because, with increasing disclosure and long waiting times for GAC, cancer cases in TGD people are predicted to increase. 
Summary boxes. What is already known on this topic: The annual number of cancer cases in transgender and gender diverse (TGD) people in England is currently unknown, as gender incongruence is not recorded by the National Cancer Registration and Analysis Service. Some gender-affirming care (GAC) interventions are known to modulate cancer risk. Use of testosterone and chest reconstruction by transmasculine people is known to reduce their incidence of breast cancer compared with cisgender women. Use of oestradiol alongside medical or surgical androgen suppression has been shown to reduce the incidence of prostate cancer in transfeminine people while increasing their risk of breast cancer, compared with cisgender men. What this study adds: This study found that there are likely to be approximately 966 cancer cases (excluding non-melanoma skin cancer) in TGD people per year in the UK. Although total annual cancer cases in TGD people are expected to reach 2258 in 10 years, improved access to gender-affirming care could reduce this to 2012 (an 11% reduction). These figures provide additional justification for funding to improve access to GAC via the National Health Service (NHS), as well as for training on the oncological needs of this population.
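
The age-stratified Monte Carlo approach described in the Design can be illustrated with a deliberately simplified sketch that propagates uncertainty in stratum population sizes into a 95% uncertainty interval for annual case counts. All inputs are hypothetical: each age group is given an assumed mean population, standard deviation, and fixed incidence rate per 100,000, and the sketch ignores the study's adjustments for gender-affirming care and its NORDPRED projections. The function `simulate_annual_cases` is not the authors' model.

```python
import random
import statistics
from typing import Dict, Tuple

# Hypothetical input: age group -> (mean population, population SD, incidence per 100,000)
Strata = Dict[str, Tuple[float, float, float]]


def simulate_annual_cases(strata: Strata, n_sims: int = 10_000, seed: int = 1) -> Tuple[float, Tuple[float, float]]:
    """Return the median simulated annual case count and a 95% uncertainty interval."""
    rng = random.Random(seed)  # seeded for reproducible draws
    totals = []
    for _ in range(n_sims):
        total = 0.0
        for pop_mean, pop_sd, rate_per_100k in strata.values():
            # Draw a population size for this stratum, truncated at zero
            population = max(0.0, rng.gauss(pop_mean, pop_sd))
            total += population * rate_per_100k / 100_000
        totals.append(total)
    # With n=40, the first and last cut points are the 2.5th and 97.5th percentiles
    cuts = statistics.quantiles(totals, n=40)
    return statistics.median(totals), (cuts[0], cuts[-1])


if __name__ == "__main__":
    example = {"15-59": (350_000, 5_000, 60.0), "60+": (90_000, 3_000, 900.0)}  # made-up values
    median, (low, high) = simulate_annual_cases(example)
    print(f"median {median:.0f} (95% UI {low:.0f}-{high:.0f})")
```

Summing across strata inside each simulation, rather than summing stratum intervals afterwards, keeps the uncertainty interval for the total coherent with the per-stratum draws.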

17

Racioethnic Disparities in Risk of Cardiometabolic Risk Factors and Cardiovascular Disease among Women Treated for Breast Cancer: The Pathways Heart Study

Yao, S.; Zimbalist, A.; Sheng, H.; Fiorica, P.; Cheng, R.; Medicino, L.; Omilian, A.; Zhu, Q.; Roh, J.; Laurent, C.; Lee, V.; Ergas, I.; Iribarren, C.; Rana, J.; Nguyen-Huynh, M.; Rillamas-Sun, E.; Hershman, D.; Ambrosone, C.; Kushi, L.; Greenlee, H.; Kwan, M.

2026-04-24 epidemiology 10.64898/2026.04.23.26351612 medRxiv

Top 0.2%

2.1%

Background: Few studies have examined racioethnic disparities in cardiovascular disease (CVD) among women treated for breast cancer, who are at higher CVD risk because of cardiotoxic cancer treatment. Methods: Based on the Pathways Heart Study of women with a history of breast cancer, this analysis examines the associations of self-reported race and ethnicity, as well as genetic similarity, with cardiometabolic risk factors (hypertension, diabetes, and dyslipidemia) and CVD events. Multivariable logistic and Cox proportional hazards regression models were used to test associations of race and ethnicity and of genetic similarity with prevalent and incident cardiometabolic risk factors and CVD events. Results: Of the 4,071 patients in this analysis, non-Hispanic Black (NHB), Asian, and Hispanic women were more likely to have prevalent and incident diabetes than non-Hispanic White (NHW) women. Analysis of genetic similarity revealed results consistent with self-reported race and ethnicity. For CVD risk, NHB women were more likely to develop heart failure and cardiomyopathy than NHW women. In contrast, Hispanic women were at lower risk of any incident CVD, serious CVD, arrhythmia, heart failure or cardiomyopathy, and ischemic heart disease, which was consistent with the associations found with Native American ancestry. Conclusions: This is the largest multi-ethnic study of disparities in CVD health in breast cancer survivors, demonstrating corroborating findings between self-reported race and ethnicity and genetic similarity. The results highlight disparities in cardiometabolic risk factors and CVD among breast cancer survivors that warrant more research and clinical attention in these distinct, high-risk populations.

18

Cross-Tabulating Epidemiological Covariates with AUDIT-C Data in Large-Scale Biobanks

Blackburn, A.

2026-04-03 epidemiology 10.64898/2026.04.01.26349975 medRxiv

Top 0.3%

1.8%

Introduction: The Alcohol Use Disorders Identification Test-Consumption (AUDIT-C) is a widely used screening tool in large-scale electronic health record (EHR) biobanks. However, its categorical, range-based survey responses present a significant challenge for epidemiological research, especially where continuous quantitative variables may be preferred. Standard workarounds, such as assigning categorical midpoints or using aggregate ordinal scores for regression mapping, often introduce false mathematical precision or obscure critical behavioral nuances between drinking frequency and quantity. This report presents a novel framework for presenting and bounding categorical alcohol survey data. Materials and Methods: I developed two complementary descriptive techniques: (1) a two-dimensional cross-tabulation matrix that preserves the interaction between drinking frequency and typical quantity, and (2) a systematic bounding algorithm that applies time-interval correction factors to calculate strict lower and upper estimates of average daily alcohol consumption. To demonstrate the real-world utility of this framework, I applied these methods to three descriptive scenarios within a European ancestry (EUR) cohort of the All of Us Research Program: Generalized Anxiety Disorder (GAD) prevalence (n=104,893), minor allele frequency (MAF) of the rs1229984 genetic variant (n=104,890), and self-reported active-duty military service history (n=104,893). Results: Application of the cross-tabulation matrix revealed patterns across all three descriptive scenarios. For example, participants reporting the highest frequency ("4 or more times a week") combined with the highest quantity ("10 or More" drinks) had a GAD prevalence of 13.5%, compared with 5.8% among those reporting the same frequency but a low quantity ("1 or 2" drinks). 
A general trend of increased anxiety among higher-quantity drinkers contrasts with a general trend of decreased anxiety among higher-frequency drinkers. Bounding estimates for average daily consumption ranged from 0.299 to 0.730 drinks for individuals with GAD and from 0.303 to 0.787 drinks for those without. Those who reported active-duty service in the US Armed Forces showed a general trend toward more frequent drinking and higher average daily consumption estimates (0.339 to 0.875) than those who had not (0.297 to 0.770). The minor allele of rs1229984 showed a clear effect, reducing both drinking frequency and quantity and resulting in lower average daily consumption estimates. Conclusions: This bounding and mapping framework offers researchers a complement to traditional midpoint and aggregate scoring methods. By explicitly defining the uncertainty inherent in categorical survey instruments and visualizing cohort distributions across intersecting behavioral axes, this methodology improves the resolution, reproducibility, and interpretability of lifestyle exposure data.
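
The two descriptive techniques described in the Methods can be sketched as follows. Everything here is an assumption for illustration rather than the author's algorithm: the answer labels, the occasions-per-day bounds assigned to each frequency answer (using 365.25/12 days per month and at most one drinking occasion per day), the cap placed on the open-ended "10 or more" quantity answer, and all function names. Each respondent's lower bound pairs the lowest plausible frequency with the lowest plausible quantity, and the upper bound pairs the highest with the highest; cohort bounds are the means of those per-person bounds.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

DAYS_PER_WEEK = 7.0
DAYS_PER_MONTH = 365.25 / 12  # assumed calendar conversion factor

# Hypothetical bounds on drinking occasions per day for each frequency answer
FREQUENCY_BOUNDS: Dict[str, Tuple[float, float]] = {
    "Never": (0.0, 0.0),
    "Monthly or less": (0.0, 1 / DAYS_PER_MONTH),
    "2 to 4 times a month": (2 / DAYS_PER_MONTH, 4 / DAYS_PER_MONTH),
    "2 to 3 times a week": (2 / DAYS_PER_WEEK, 3 / DAYS_PER_WEEK),
    "4 or more times a week": (4 / DAYS_PER_WEEK, 1.0),  # assumes at most one occasion per day
}


def quantity_bounds(cap: float) -> Dict[str, Tuple[float, float]]:
    """Hypothetical drinks-per-occasion bounds; `cap` closes the open top category."""
    return {
        "1 or 2": (1.0, 2.0),
        "3 or 4": (3.0, 4.0),
        "5 or 6": (5.0, 6.0),
        "7 to 9": (7.0, 9.0),
        "10 or more": (10.0, cap),
    }


def daily_bounds(frequency: str, quantity: Optional[str], cap: float = 12.0) -> Tuple[float, float]:
    """Lower and upper estimates of average drinks per day for one respondent."""
    if frequency == "Never":
        return 0.0, 0.0  # non-drinkers skip the quantity question
    f_lo, f_hi = FREQUENCY_BOUNDS[frequency]
    q_lo, q_hi = quantity_bounds(cap)[quantity]
    return f_lo * q_lo, f_hi * q_hi


def cohort_bounds(responses: Iterable[Tuple[str, Optional[str]]], cap: float = 12.0) -> Tuple[float, float]:
    """Mean lower and mean upper daily-consumption bounds across respondents."""
    pairs = [daily_bounds(f, q, cap) for f, q in responses]
    if not pairs:
        raise ValueError("no responses")
    return sum(lo for lo, _ in pairs) / len(pairs), sum(hi for _, hi in pairs) / len(pairs)


def crosstab_prevalence(records: Iterable[Tuple[str, Optional[str], bool]]) -> Dict[Tuple[str, Optional[str]], float]:
    """Outcome prevalence in each frequency x quantity cell of the cross-tabulation matrix."""
    cells = defaultdict(lambda: [0, 0])  # (frequency, quantity) -> [cases, respondents]
    for frequency, quantity, has_outcome in records:
        cell = cells[(frequency, quantity)]
        cell[0] += int(has_outcome)
        cell[1] += 1
    return {key: cases / total for key, (cases, total) in cells.items()}
```

Keeping frequency and quantity as separate axes in `crosstab_prevalence` is what preserves the contrast the Results describe (high quantity versus high frequency), which a single aggregate score would collapse.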

19

A Conversational Artificial Intelligence Framework for Comparative Pathway-Level Profiling of Sezary Syndrome and Primary Cutaneous CD8+ Aggressive Epidermotropic Cytotoxic T-Cell Lymphoma (PCAECTCL)

Diaz, F. C.; Waldrup, B.; Carranza, F. G.; Manjarrez, S.; Velazquez-Villarreal, E.

2026-04-17 oncology 10.64898/2026.04.15.26350992 medRxiv

Top 0.3%

1.7%

Background: Sezary syndrome (SS) is an aggressive leukemic variant of cutaneous T-cell lymphoma (CTCL) with distinct clinical and biological features compared with rarer entities such as primary cutaneous CD8+ aggressive epidermotropic cytotoxic T-cell lymphoma (PCAECTCL). Although recurrent genomic alterations in CTCL have been described, comparative analyses at the pathway level across biologically divergent subtypes remain limited. Here, we leveraged a conversational artificial intelligence (AI) platform for precision oncology to enable rapid, integrative, and hypothesis-driven interrogation of publicly available genomic datasets. Methods: We conducted a secondary analysis of somatic mutation and clinical data from the Columbia University CTCL cohort accessed via cBioPortal. Cases were stratified into SS (n=26) and PCAECTCL (n=13). High-confidence coding variants were curated and mapped to biologically relevant signaling pathways and functional gene categories implicated in CTCL pathogenesis. Pathway-level mutation frequencies were compared using chi-square or Fisher's exact tests, with effect sizes quantified as odds ratios. Tumor mutational burden (TMB) was compared using the Wilcoxon rank-sum test. Subtype-specific co-mutation patterns were evaluated using pairwise association analyses and visualized through oncoplots and network heatmaps. Conversational AI agents (AI-HOPE) were used to iteratively refine cohort definitions, prioritize pathway-level signals, and contextualize findings. Results: TMB was comparable between SS and PCAECTCL (p = 0.96), indicating no significant difference in global mutational load. In contrast, pathway-centric analyses revealed marked qualitative differences. SS demonstrated enrichment of alterations in epigenetic regulators, tumor suppressor and cell-cycle control pathways, NFAT signaling, and DNA damage response mechanisms, consistent with transcriptional dysregulation and immune modulation. 
PCAECTCL exhibited relatively higher frequencies of alterations involving epigenetic regulators and MAPK pathway signaling, suggesting distinct oncogenic dependencies. Co-mutation analysis revealed a more constrained and focused interaction landscape in SS, whereas PCAECTCL displayed broader and more heterogeneous co-mutation networks, indicative of divergent evolutionary trajectories. Notably, ERBB2 mutation frequency differed significantly between subtypes (p = 0.031), highlighting a potential subtype-specific therapeutic vulnerability. Conclusions: This study demonstrates that SS is distinguished from PCAECTCL not by increased mutational burden but by distinct pathway-level architectures, particularly involving epigenetic regulation, immune signaling, and transcriptional control. These findings generate biologically grounded, testable hypotheses for subtype-specific therapeutic targeting and underscore the value of conversational AI as a scalable framework for accelerating discovery in translational cancer genomics.
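
The pathway-level comparisons in the Methods rest on 2×2 tables of subtype by alteration status. A minimal, hypothetical sketch of the two-sided Fisher's exact test with its sample odds ratio is shown below, using only the standard library; it sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. It is not the AI-HOPE implementation, and the function name `fisher_exact_two_sided` is an assumption.

```python
from math import comb
from typing import Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Odds ratio and two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be subtypes (e.g. SS, PCAECTCL); columns pathway altered / not altered.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with all margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = table_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table at most as probable as the observed one (relative tolerance for float noise)
    p_value = sum(
        p for p in (table_prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    if b * c == 0:
        odds_ratio = float("inf") if a * d > 0 else float("nan")
    else:
        odds_ratio = (a * d) / (b * c)
    return odds_ratio, min(1.0, p_value)
```

With small subgroups such as n=26 and n=13, exact tests like this are generally preferred over the chi-square approximation; SciPy users can obtain the same two-sided test from `scipy.stats.fisher_exact`.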

20

Time to diagnosis among children and adolescents with cancer in Quebec, Canada: a population-based study

Mullen, C.; Barr, R. D.; Strumpf, E.; El-Zein, M.; Franco, E. L.; Malagon, T.

2026-04-13 epidemiology 10.64898/2026.04.09.26350491 medRxiv

Top 0.3%

1.7%

Background: Timely cancer diagnosis in children and adolescents is critical to improving outcomes, yet substantial variation in diagnostic intervals persists across cancer types and care settings. We aimed to quantify time to diagnosis and assess variation by patient, demographic, and system-level factors. Methods: We conducted a retrospective population-based study of children and adolescents aged 0-19 years diagnosed with one of 12 common cancers between 2010 and 2022 in Quebec, Canada. The diagnostic interval was defined as the time from the first cancer-related healthcare encounter to diagnosis. We calculated medians and interquartile ranges (IQR) overall and by cancer type, and used multivariable quantile regression to identify factors associated with time to diagnosis at the 25th, 50th, and 75th percentiles. Results: Among 2,927 individuals with cancer, diagnostic intervals varied by cancer type and age. Median intervals were longest for carcinomas (100 days; IQR 33-192) and shortest for leukemias (8 days; IQR 3-44). Compared with children living in Montreal, those living in regional areas and other large urban centres had longer 50th and 75th percentiles of time to diagnosis for hepatic and central nervous system (CNS) tumours. Diagnostic intervals were shorter in the post-pandemic period (2020-2022) across several cancer sites, with CNS tumours showing reductions across all quantiles. Interpretation: Diagnostic timeliness differed by cancer type, age, and rurality, but not by sex or by material or social deprivation. The shorter diagnostic intervals observed in the post-pandemic period suggest that pandemic-related changes in care pathways may have expedited diagnosis for some cancers.